Cactus: Issues for Sustainable Simulation Software
The Cactus Framework is an open-source, modular, portable programming
environment for the collaborative development and deployment of scientific
applications using high-performance computing. Its roots reach back to 1996 at
the National Center for Supercomputing Applications and the Albert Einstein
Institute in Germany, where its development began. Since then, the Cactus
framework has witnessed major changes in hardware infrastructure as well as its
own community. This paper describes its endurance through these past changes
and, drawing upon lessons from its past, also discusses its future.
Comment: submitted to the Workshop on Sustainable Software for Science:
Practice and Experiences 201
Attribute Elicitation: Implications in the Research Context
Three different methods of attribute elicitation for two different paper-based products were compared in this study. The three methods used were free elicitation (FE), hierarchical dichotomization (HD), and Kelly's repertory grid (RG). The two paper-based products used in this study were bathroom tissue and paper towels. The methods were compared by abstraction, efficiency in data collection, convergent validity, and respondents' reaction to the task. The results from this comparison indicated that the level of abstraction did not significantly differ between methods or products. However, a rank order analysis revealed a substantial difference, with 18% and 20% of the attributes rated significantly differently between the elicitation methods for paper towels and bathroom tissue, respectively. Convergent validity was exhibited between all the methods, although it was highest between HD and RG. These findings suggest that all three elicitation methods elicit very similar information from the consumers' knowledge base. The efficiency comparison revealed that, for both products, FE took significantly less time both to complete the task and to elicit the individual attributes. Furthermore, HD was identified as the least efficient of the methods for either product. In terms of reaction to the task, FE was found to be the least difficult of the three methods, and it also allowed the respondents to express their opinions more freely.
Shared memory parallelism in Modern C++ and HPX
Parallel programming remains a daunting challenge, from the struggle to
express a parallel algorithm without cluttering the underlying synchronous
logic, to describing which devices to employ in a calculation, to correctness.
Over the years, numerous solutions have arisen, many of them requiring new
programming languages, extensions to programming languages, or the addition of
pragmas. Support for these various tools and extensions is available to a
varying degree. In recent years, the C++ standards committee has worked to
refine the language features and libraries needed to support parallel
programming on a single computational node. Eventually, all major vendors and
compilers will provide robust and performant implementations of these
standards. Until then, the HPX library and runtime provides cutting edge
implementations of the standards, as well as proposed standards and extensions.
Because of these advances, it is now possible to write high performance
parallel code without custom extensions to C++. We provide an overview of
modern parallel programming in C++, describing the language and library
features, and providing brief examples of how to use them.
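The idea of keeping the sequential logic untouched while delegating the parallelism to the library can be sketched in a few lines. In C++17 this role is played by execution policies such as `std::execution::par` passed to algorithms like `std::transform_reduce`, and HPX builds on futures returned by `hpx::async`; the snippet below is only a Python analog using `concurrent.futures`, not HPX or standard C++ code. The helper names `chunked` and `parallel_sum_of_squares` are hypothetical, and because of CPython's global interpreter lock these threads illustrate the structure rather than deliver a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous slices."""
    size = max(1, (len(data) + n_chunks - 1) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, executor, n_chunks=4):
    """Sum of x*x over data, with one task per chunk (hypothetical helper)."""

    def partial(chunk):
        # Ordinary sequential logic; it knows nothing about threads.
        return sum(x * x for x in chunk)

    # The parallel structure is confined to these two lines: submit() returns
    # a future per chunk, and result() blocks until that chunk is done.
    futures = [executor.submit(partial, c) for c in chunked(data, n_chunks)]
    return sum(f.result() for f in futures)


with ThreadPoolExecutor(max_workers=4) as pool:
    total = parallel_sum_of_squares(list(range(10)), pool)
print(total)  # 285
```

Swapping the executor (or, in C++, the execution policy) changes how the work runs without touching `partial`, which is the separation the overview above describes.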
A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems
Heterogeneous systems are becoming more common on High Performance Computing
(HPC) systems. Even with tools like CUDA and OpenCL, it is a non-trivial task
to obtain optimal performance on the GPU. Approaches to simplifying this task
include Merge (a library-based framework for heterogeneous multi-core systems),
Zippy (a framework for parallel execution of codes on multiple GPUs), BSGP (a
new programming language for general purpose computation on the GPU) and
CUDA-lite (an enhancement to CUDA that transforms code based on annotations).
In addition, efforts are underway to improve compiler tools for automatic
parallelization and optimization of affine loop nests for GPUs and for
automatic translation of OpenMP parallelized codes to CUDA.
In this paper we present an alternative approach: a new computational
framework for the development of massively data-parallel scientific
applications suitable for use on such petascale/exascale hybrid systems, built
upon the highly scalable Cactus framework. As the first non-trivial
demonstration of its usefulness, we successfully developed a new 3D CFD code
that achieves improved performance.
Comment: Parallel Computing 2011 (ParCo2011), 30 August to 2 September 2011,
Ghent, Belgium
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.
Comment: 18 pages, 4 figures, accepted for publication in Scientific
Programming
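To make "higher-order finite differences" concrete, the sketch below compares the standard second-order centered stencil for a first derivative with the fourth-order stencil (f(x-2h) - 8f(x-h) + 8f(x+h) - f(x+2h)) / (12h). It is a generic one-dimensional illustration, not code generated by Chemora, whose stencils are produced from the tensor equations on multi-block grids. Halving `h` should reduce the fourth-order error by roughly a factor of 16, versus about 4 for the second-order stencil.

```python
import math


def d1_second_order(f, x, h):
    """Second-order centered first derivative: error ~ h**2."""
    return (f(x + h) - f(x - h)) / (2 * h)


def d1_fourth_order(f, x, h):
    """Fourth-order centered first derivative: error ~ h**4."""
    return (f(x - 2 * h) - 8 * f(x - h) + 8 * f(x + h) - f(x + 2 * h)) / (12 * h)


x0 = 1.0
exact = math.cos(x0)  # d/dx sin(x) = cos(x)
for h in (0.1, 0.05):
    err2 = abs(d1_second_order(math.sin, x0, h) - exact)
    err4 = abs(d1_fourth_order(math.sin, x0, h) - exact)
    print(f"h={h}: second-order error {err2:.1e}, fourth-order error {err4:.1e}")
```

The extra accuracy comes from a wider stencil: each point needs two neighbours on each side, so a domain split into blocks must carry correspondingly wider ghost zones.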
Analysis of Weather Data Collected From Two Locations in a Small Urban Community
The heat island effect is a well-known feature of the microclimate of urban areas and is generally described as the temperature difference between an urban area and its surroundings. Although this study employs only two instruments, the authors are not aware of any studies that use hourly data to examine the temperature difference between an instrument inside a town the size of Sedalia and its surroundings. We attempt to infer the impact of Sedalia, Missouri, the State Fair Community College campus, and the state fairgrounds on the temperature patterns of a small region of west-central Missouri. Two stations were used: one on the grounds of State Fair Community College and the other at the Sedalia Airport. Temperature, precipitation, cloudiness, and wind information were gathered hourly between 1 February and 31 March 2005. The weather station at the regional airport was located 11 km (7 miles) northeast of the campus instrument. Our results indicate that the city has no discernible impact on the distribution of monthly precipitation totals. We found a distinct difference between the local surface temperatures recorded by the two instruments. As inferred from these instruments, the Sedalia town center was typically about 2-6 °F (1.0-3.3 °C) warmer than the surrounding environment. In the hourly data, this difference reached as much as 11 °F (6 °C). Additionally, the difference was larger on clear days and on days with little wind.
Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java
Many scientific high-performance codes that simulate, e.g., black holes,
coastal waves, and climate and weather rely on block-structured meshes and
use finite differencing methods to iteratively solve the appropriate systems of
differential equations. In this paper we investigate implementations of an
extremely simple simulation of this type using various programming systems and
languages. We focus on a shared memory, parallelized algorithm that simulates a
1D heat diffusion using asynchronous queues for the ghost zone exchange. We
discuss the advantages of the various platforms and explore the performance of
this model code on different computing architectures: Intel, AMD, and ARM64FX.
Python was the slowest of the platforms we compared. Java, Go, Swift,
and Julia were intermediate performers. The higher-performing platforms
were C++, Rust, Chapel, Charm++, and HPX.
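The model problem can be sketched as follows, assuming an explicit update u_i <- u_i + alpha (u_{i-1} - 2 u_i + u_{i+1}) on a periodic domain with 0 < alpha <= 0.5 for stability. The domain is split into partitions, each advanced by its own thread, and the single ghost value each partition needs from its left and right neighbour arrives through a `queue.Queue`, standing in for the asynchronous queues described above. The function names and parameters are hypothetical, this is not the benchmark's code, and CPython threads will not run the arithmetic in parallel; the point is the communication pattern. A serial version is included as a reference.

```python
import queue
import threading


def heat_1d_threaded(u0, n_parts, steps, alpha):
    """Explicit periodic 1D heat diffusion, one thread per partition.

    left_in[p] delivers the left neighbour's rightmost cell and right_in[p]
    the right neighbour's leftmost cell (the ghost zones).
    """
    n = len(u0)
    if n % n_parts != 0 or not 0 < alpha <= 0.5:
        raise ValueError("need len(u0) divisible by n_parts and 0 < alpha <= 0.5")
    size = n // n_parts
    left_in = [queue.Queue() for _ in range(n_parts)]
    right_in = [queue.Queue() for _ in range(n_parts)]
    result = [None] * n_parts

    def worker(p):
        u = list(u0[p * size:(p + 1) * size])
        left, right = (p - 1) % n_parts, (p + 1) % n_parts
        for _ in range(steps):
            # Post this partition's boundary cells to the neighbours' queues.
            right_in[left].put(u[0])
            left_in[right].put(u[-1])
            # Block until both ghost values for this step have arrived.
            ghost_left = left_in[p].get()
            ghost_right = right_in[p].get()
            padded = [ghost_left] + u + [ghost_right]
            u = [padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
                 for i in range(1, size + 1)]
        result[p] = u

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(n_parts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [x for part in result for x in part]


def heat_1d_serial(u0, steps, alpha):
    """Single-array reference with the same periodic update."""
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        # u[i - 1] wraps to u[-1] at i == 0, giving periodic boundaries.
        u = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[(i + 1) % n]) for i in range(n)]
    return u


u0 = [0.0] * 16
u0[8] = 1.0  # a unit heat spike in the middle
threaded = heat_1d_threaded(u0, n_parts=4, steps=50, alpha=0.25)
serial = heat_1d_serial(u0, steps=50, alpha=0.25)
print(threaded == serial)  # True
```

Because each queue has a single producer and is FIFO, a neighbour that runs ahead cannot overwrite a ghost value that has not been consumed yet; the receiving partition simply reads step k's value before step k+1's.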
Asynchronous Execution of Python Code on Task Based Runtime Systems
Despite advancements in the areas of parallel and distributed computing, the
complexity of programming on High Performance Computing (HPC) resources has
deterred many domain experts, especially in the areas of machine learning and
artificial intelligence (AI), from exploiting the performance benefits of such
systems. Researchers and scientists favor high-productivity languages to avoid
the inconvenience of programming in low-level languages and the cost of acquiring
the skills required for programming at this level. In recent years,
Python, with the support of linear algebra libraries like NumPy, has gained
popularity despite limitations that prevent such code from running in distributed
settings. Here we present a solution that maintains both high-level programming
abstractions and parallel and distributed efficiency. Phylanx is an
asynchronous array processing toolkit which transforms Python and NumPy
operations into code which can be executed in parallel on HPC resources by
mapping Python and NumPy functions and variables into a dependency tree
executed by HPX, a general purpose, parallel, task-based runtime system written
in C++. Phylanx additionally provides introspection and visualization
capabilities for debugging and performance analysis. We have tested the
foundations of our approach by comparing our implementation of widely used
machine learning algorithms to accepted NumPy standards.
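The core idea of turning array operations into a dependency tree that a task-based runtime executes can be sketched in a few lines. The `Node` and `Const` classes and the `evaluate` function below are hypothetical and do not reflect Phylanx's API; Phylanx builds its tree from Python/NumPy code and executes it with HPX in C++, whereas this sketch uses scalar operations and Python's `concurrent.futures`. Each node becomes a task that waits on its children's futures, so independent subtrees, such as the two operands of the multiplication, may be evaluated concurrently.

```python
import operator
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical expression-tree node computing op(*children)."""

    def __init__(self, op, *children):
        self.op = op
        self.children = children


class Const(Node):
    """Leaf node holding a literal value."""

    def __init__(self, value):
        super().__init__(None)
        self.value = value


def evaluate(node, pool):
    """Schedule node and its subtree on pool; return a future for its value."""
    if isinstance(node, Const):
        return pool.submit(lambda: node.value)
    # Schedule all children first, so independent subtrees can run concurrently.
    child_futures = [evaluate(child, pool) for child in node.children]
    # The parent task waits on its children's futures, then applies its op.
    return pool.submit(lambda: node.op(*[f.result() for f in child_futures]))


# (2 + 3) * (10 - 4): the add and sub subtrees do not depend on each other.
expr = Node(operator.mul,
            Node(operator.add, Const(2), Const(3)),
            Node(operator.sub, Const(10), Const(4)))
with ThreadPoolExecutor(max_workers=4) as pool:
    print(evaluate(expr, pool).result())  # 30
```

Children are always submitted before their parent, so with the executor's FIFO work queue a blocked parent only waits on tasks that have already been picked up. HPX instead offers continuations (`future::then`, `hpx::dataflow`) and lightweight threads that suspend rather than block an operating-system thread.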
- …